Automatic Resolution of Ambiguous Abbreviations in Biomedical Texts using Support Vector Machines and One Sense Per Discourse Hypothesis
Abstract
We present an algorithm that disambiguates abbreviations in Medline abstracts using Support Vector Machines (SVM) and the one-sense-per-discourse hypothesis. In contrast to other work that uses SVM for natural language disambiguation, which always depends on handcrafted training and testing data, the algorithm presented here extracts its training and testing data automatically by searching the texts for the long form of each abbreviation and applying the one-sense-per-discourse hypothesis. In the testing phase, we also use this hypothesis to unify the classifier's outputs via majority voting. Our experimental results demonstrate that SVM is a promising technique for abbreviation disambiguation and that majority voting in the testing phase can improve accuracy from 82.35% to 84.31%.
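The two uses of the one-sense-per-discourse hypothesis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the context window size, the tokenization, and the tie-breaking rule are all assumptions made for illustration, and the SVM classifier itself is omitted.

```python
import re
from collections import Counter


def extract_labeled_contexts(abstract, abbreviation, long_forms, window=5):
    """Label every occurrence of `abbreviation` in one abstract with the
    single long form found in that abstract (one sense per discourse).

    Hypothetical helper: returns [] when the abstract contains zero or
    several candidate long forms, because the sense is then unknown or
    ambiguous. The window size of 5 tokens is an arbitrary assumption.
    """
    lowered = abstract.lower()
    found = [lf for lf in long_forms if lf.lower() in lowered]
    if len(found) != 1:
        return []
    sense = found[0]
    tokens = re.findall(r"\w+", abstract)
    examples = []
    for i, tok in enumerate(tokens):
        if tok == abbreviation:
            # Context = up to `window` tokens on each side of the abbreviation.
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            examples.append((context, sense))
    return examples


def majority_vote(predictions):
    """Unify per-occurrence predictions within one abstract.

    Every occurrence receives the most frequent predicted sense. Ties go to
    the sense that was predicted first (an assumption, not from the paper).
    """
    if not predictions:
        return []
    winner, _ = Counter(predictions).most_common(1)[0]
    return [winner] * len(predictions)
```

In this sketch, the labeled contexts would serve as training or testing instances for a classifier, and `majority_vote` would be applied to the classifier's predictions for all occurrences of one abbreviation in the same abstract. A real pipeline would probably also remove the long form from the context so the classifier cannot simply read off the label.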
Similar resources
Resolving abbreviations to their senses in Medline
MOTIVATION: Biological literature contains many abbreviations with one particular sense in each document. However, most abbreviations do not have a unique sense across the literature. Furthermore, many documents do not contain the long forms of the abbreviations. Resolving an abbreviation in a document consists of retrieving its sense in use. Abbreviation resolution improves accuracy of document...
Abbreviation Disambiguation: Experiments with Various Variants of the One Sense per Discourse Hypothesis
Abbreviations are widely used in many languages, and disambiguating them is critical. In this research, a structured process that attempts to solve the problem of abbreviation ambiguity is presented. Various baseline methods have been explored, including context-related methods and statistical methods. Almost all methods are domain-independent and language-independent. The applicatio...
Translation of Acronyms, Initialisms and Abbreviations (AIA) in Persian Political and Sport Journalistic Texts
The different writing systems of English and Persian make the translation of acronyms, initialisms and abbreviations challenging. This study aimed to find which strategies were applied most frequently in translating acronyms, initialisms and abbreviations from English to Persian, especially in journalistic texts. The study was based on the Descriptive Translation Studies of Toury and strategies pr...
Automatic Interpretation of UltraCam Imagery by Combination of Support Vector Machine and Knowledge-based Systems
With the development of digital sensors, an increasing number of high-resolution images are available. Interpreting these images manually is not feasible, which makes it necessary to seek practical, fast and automatic solutions to environmental and location-based management problems. Land cover classification using high-resolution imagery is a difficult process because of the c...
An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification
Due to the exponential growth of electronic texts, organizing and managing them requires a tool that delivers the information users search for in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing, and especially text processing, one of the most basic tasks is automatic text classification. Moreover, text ...